Abstract

Cryptic genetic sequences have attenuated effects on phenotypes. In the classic view, relaxed selection 
allows cryptic genetic diversity to build up across individuals in a population, providing alleles that may 
later contribute to adaptation when co-opted - e.g. following a mutation increasing expression from 
a low, attenuated baseline. This view is described, for example, by the metaphor of the spread of a 
population across a neutral network in genotype space. As an alternative view, consider the fact that 
most phenotypic traits are affected by multiple sequences, including cryptic ones. Even in a strictly clonal 
population, the co-option of cryptic sequences at different loci may have different phenotypic effects 
and offer the population multiple adaptive possibilities. Here, we model the evolution of quantitative 
phenotypic characters encoded by cryptic sequences, and compare the relative contributions of genetic 
diversity and of variation across sites to the phenotypic potential of a population. We show that most of the 
phenotypic variation accessible through co-option would exist even in populations with no polymorphism. 
This is made possible by a history of compensatory evolution, whereby the phenotypic effect of a cryptic 
mutation at one site was balanced by mutations elsewhere in the genome, leading to a diversity of cryptic 
effect sizes across sites rather than across individuals. Cryptic sequences might accelerate adaptation 
and facilitate large phenotypic changes even in the absence of genetic diversity, as traditionally defined 
in terms of alternative alleles. 



Introduction 



Populations that contain or produce a greater range of heritable phenotypic variants are more likely to 
adapt to novel environments. However, "new" phenotypes often have low fitness in the ancestral envi- 
ronment, limiting populations' ability to accumulate potentially adaptive genetic diversity. This problem 
is partly resolved when genetic variation is hidden at first, meaning that its phenotypic effects are
attenuated (Gibson and Dworkin, 2004). Hidden variation accumulates in the ancestral environment due to
relaxed selection. The full, unattenuated effects of such variants can later be revealed by single mutations,
by recombination into a different genetic background, or by stress-responsive developmental mechanisms
(Gibson and Dworkin, 2004; Hayden et al., 2011; Duveau and Felix, 2012).

Hidden variation can sometimes be tracked down to cryptic genetic sequences, i.e. to genes or por- 
tions of genes whose current effects on phenotype are attenuated relative to the magnitude of their possible 
effects. Attenuation may occur, for example, when a sequence is only rarely transcribed or translated
(Rajon and Masel, 2011). With such low expression levels and hence such weak selection, cryptic sequences
are more prone to accumulate mutations than are sequences whose effects are not attenuated. Cryptic
sequences nevertheless maintain the capacity for larger effects later, if and when attenuation is lost, e.g. when
they become constitutively expressed rather than expressed only occasionally. When cryptic sequences have
mutated over long periods of time, their co-option can result in large phenotypic changes and allow for
innovations that would otherwise occur on very long timescales, or might not occur at all (Whitehead et al.,
2008).






Here we explore an abstract model of cryptic sequences, their attenuation, accumulation and eventual 
potential co-option. Our model is inspired by the concrete example of cryptic DNA sequences that are 
only rarely translated into proteins. These underexpressed sequences can be whole genes or parts of genes. 
For example, sequences in the 3' untranslated region (3'UTR) of a gene are only expressed when a stop
codon is misread, resulting in an elongated protein (Rajon and Masel, 2011). The frequency of elongated
proteins in the cell - and hence their overall phenotypic effect - is normally small. Co-option occurs if
a mutation changes the stop codon into a sense codon, so that the 3'UTR becomes constitutively expressed.
This has happened many times during the evolutionary history of Saccharomyces (Giacomelli et al., 2007),
rodents (Giacomelli et al., 2007) and prokaryotes (Vakhrusheva et al., 2011). Introns can similarly be
co-opted when a mutation alters splicing efficiency (Modrek and Lee, 2002; Kondrashov and Koonin, 2003;
Lee et al., 2012). Similarly, constitutively down-regulated genes in regulatory networks can act as cryptic
sequences. Co-option in this case can occur through mutations within a gene's regulatory region, which may
lead to large increases in its expression (Tirosh et al., 2009; Cheung et al., 2010; Tirosh et al., 2010).

There are other sequences that yield abundant gene products, but are nevertheless cryptic in the [...]
(Scare et al., 2004; Hansen, 2006; [...]); co-option in this case may occur via mutations modifying epistasis.
For example, mutations in two genes
explain most of the evolution of light coloration in oldfield mice after the colonization of a new sandy
habitat on the Gulf Coast of Florida (Steiner et al., 2007). The interaction between a ligand (encoded by the
Agouti gene) and its receptor (Mc1r) controls the color of the dorsal coat in mice. Mc1r is found at the
surface of pigment-producing cells and governs the relative abundance of a light pigment. In the ancestral
mouse population living on a dark soil, an allele of Agouti masks the effect of a mutation of Mc1r, which
would otherwise increase the production of the light pigment (Barrett and Schluter, 2008). A mutation
suppressing the repressing action of Agouti increases the production of this pigment, resulting in a lighter
color favored by selection on the lighter soil of the coast (Steiner et al., 2007).



Clonal population with a rich mutational neighborhood. The population 
only has a single genotype - it occupies a single node in the genotype 
network - but it can reach a diverse array of potentially adaptive 
phenotypes through new mutations. 



Diverse population with poor mutational neighborhoods. The population 
is genetically diverse as it occupies several nodes in the genotype network, 
but each genotype can only reach a small subset of all possible new 
phenotypes through mutation. The population as a whole can reach many 
new phenotypes, but this ability would be lost were the population to 
become clonal. 
Figure 1. A schematic genotype network where a clonal and a genetically diverse population can access similar sets of
phenotypes through mutation. Nodes represent genotypes, which may yield different phenotypes represented by different colors 
(including white, which is optimal in the current environment). Connections between nodes represent possible mutations. The 
genotypes present in the two example populations are represented by grey areas. The population on the left is clonal, so 
its phenotypic potential (the distribution of new phenotypes accessible through mutations) corresponds to the neighborhood 
richness of the unique genotype. The population on the right is diverse, but each genotype has relatively poor neighborhood 
richness (genotypes can access 1-2 new phenotypes through 1 mutation). In this population, the phenotypic potential depends 
on neighborhood richness and on genetic diversity. 
Here we investigate the adaptive potential made possible through the co-option of cryptic genetic
sequences. Adaptive potential is often illustrated by genotype networks, where nodes represent genotypes and
edges represent single mutational steps (Wagner, 2005; see Fig. 1). The number of new phenotypes
accessible by a single mutation has two components (Masel and Trotter, 2010; Wagner, 2011). First, a population
that occupies many nodes on the network of possible genotypes - i.e. that has high genetic diversity - may
be able to access different phenotypes from each of the genotypes to which it has already spread (Fig. 1 -
right part of the network). This may increase the speed of adaptation when "mutational neighborhoods"
are poor (Draghi et al., 2010; Hayden et al., 2011), that is, when each genotype can access very few new
phenotypes through mutation. Second, even a clonal population - which occupies a single node - may have 
a high adaptive potential if its mutational neighborhood contains diverse phenotypes (Fig. 1 - left part of 
the network). 

Genotype network models, including the special case of a "neutral network", are most often used to 
represent single proteins or single RNA sequences, where the genotype of interest can mutate into few 
readily accessible alternative phenotypes. These single locus genotypes generally have poor mutational 
neighborhoods. In contrast, quantitative phenotypes are generally affected by multiple genes (Visscher,
2008; Ehrenreich et al., 2010, 2012; Flint and Mackay, 2009; Buckler et al., 2009), including multiple cryptic
sequences (Lauter and Doebley, 2002; Gibson and Dworkin, 2004; Rajon and Masel, 2011). When
quantitative traits are considered, some genotypes may have access to a great variety of potentially adaptive
quantitative phenotypes through the co-option of different cryptic sequences (Fig. 2). With such a rich
mutational neighborhood already available, genetic diversity might not make a much larger further contribution
to the diversity of phenotypes accessible by co-option.

Here we calculate the relative contributions of neighborhood richness and genetic diversity under a range 
of realistic parameter values. We find that neighborhood richness dominates for phenotypes influenced 
by several cryptic sequences. This is because these cryptic sequences can have substantial effect sizes, 
due to a history of compensatory evolution. Co-option can convert cryptic attenuated effects into large 
phenotypic changes and facilitate major innovations, which occur more quickly with co-option than with 
regular mutations to cryptic or non-cryptic sequences. Because this adaptive potential resides in genomes 
instead of populations, it is unaffected by losses of genetic variation, and may be particularly important 
when such losses are frequent. 
Figure 2. A schematic genotype network where a clonal and
a genetically diverse population can access similar sets of phe- 
notypes through mutation. Nodes represent genotypes, which 
may yield different phenotypes represented by different colors 
(including white, which is optimal in the current environment). 
Connections between nodes represent possible mutations. The 
genotypes present in the two example populations are repre- 
sented by grey areas. The population on the left is clonal, so its 
phenotypic potential (the distribution of new phenotypes ac- 
cessible through mutations) corresponds to the neighborhood 
richness of the unique genotype. The population on the right 
is diverse, but each genotype has relatively poor neighborhood 
richness (genotypes can access 1-2 new phenotypes through 1 
mutation). In this population, the phenotypic potential de- 
pends on neighborhood richness and on genetic diversity. 
Methods



Model. Let the phenotype, a set of C quantitative characters, be determined by L distinct sequences in
the genome. Each sequence contributes to the C characters and hence specifies a vector in a C-dimensional
phenotype space. In its non-cryptic state, the l-th sequence of genotype j has a quantitative effect $\beta_{jlc}$ on the
c-th character. The phenotypic effect of each cryptic sequence is attenuated by a factor p, so for genotype j
character c has a value

$$x_{jc} = \sum_{l=1}^{L} p \times \beta_{jlc}. \qquad (1)$$



Mutations occur with probability $\mu = 1/(N \times 100)$ per nucleotide per generation at the 60 nucleotides
of each of the L sequences, where N is the population size. This assumption, which keeps constant the
input of mutations per generation, is empirically supported by the decrease of $\mu$ with effective population
size across species (Lynch, 2010; Sung et al., 2012). A mutation in sequence l changes the genotype j,
and hence the quantitative effects $\beta_{jlc}$ for all $c \in [1, C]$. Most quantitative genetics models assume that
mutations have a mean effect of 0 (Lande, 1976). This assumption causes the probability distribution of
$\beta_{jlc}$ to show an unbounded increase in variance over time (Lande, 1976; Lynch and Gabriel, 1983). To avoid
this unrealistic outcome, we instead assume that the mutation of sequence l adds to each of the $\beta_{jlc}$ an
amount sampled independently from a normal distribution with mean $-[\ldots]$ and standard deviation 0.5.
This introduces a bias in the mean mutational effect, such that whenever a given sequence has a large effect
size $\beta_{jlc}$, mutations will tend to decay this effect size back toward smaller values. Eventually, $\beta_{jlc}$ values
reach a stationary probability distribution, whose variance can be calculated analytically for the special case
where changes in $\beta_{jlc}$ are neutral (dashed line at top of Fig. 3; Rajon and Masel, 2011).
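
The phenotype map of Eq. (1) and the biased mutation step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' simulation code: the function names are hypothetical, and the `decay` parameter (which sets the mean of the mutational effect to minus the current effect divided by `decay`) is an assumed form of the bias, since the text here only states that the mean opposes large effect sizes.

```python
import random

def mutation_rate(pop_size):
    """Per-nucleotide mutation probability mu = 1 / (N * 100)."""
    return 1.0 / (pop_size * 100)

def phenotype(betas, p):
    """Character values x_c = sum over sequences l of p * beta_lc (Eq. 1).

    betas: list of L lists, each holding the C effect sizes of one sequence.
    p: attenuation factor (p = 1 means the sequences are not cryptic).
    """
    n_chars = len(betas[0])
    return [sum(p * seq[c] for seq in betas) for c in range(n_chars)]

def mutate_sequence(effects, decay, sd=0.5, rng=random):
    """Return a mutated copy of one sequence's effect vector.

    Each effect receives an independent normal deviate whose mean is
    -beta / decay, so that large effects tend to shrink back toward zero.
    `decay` is a hypothetical parameter (an assumption of this sketch).
    """
    return [b + rng.gauss(-b / decay, sd) for b in effects]
```

For example, `phenotype([[0.0] * 3 for _ in range(10)], 0.01)` gives the all-zero starting phenotype for L = 10 and C = 3, and `mutation_rate(10**5)` gives the value of $\mu$ quoted in the figure captions for N = 10^5.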
Figure 3. The standard deviation of the phenotypic effects,
sd($\beta_{jlc}$), increases as p decreases and sequences become more
cryptic. Variation across sites accumulates through compensatory
evolution, which occurs at a speed that depends on the
strength of selection. Accordingly, when selection is strong (p
is high, crypticity is low) sd($\beta_{jlc}$) remains low regardless of the
simulation time t, and is unlikely to reach high values at
evolutionarily relevant timescales. On the other hand, for highly
cryptic sequences where selection is very weak, sd($\beta_{jlc}$) quickly
reaches high values close to the expected value in a neutral
model, calculated according to eq. (S3) in Rajon and Masel
(2011) (dashed line). Only at intermediate levels of crypticity
does the variation across sites increase noticeably with
evolutionary time. The average of sd($\beta_{jlc}$) was calculated across
traits in a given individual, then across individuals and across
independently evolved populations.



Simulations begin with all $\beta_{jlc}$ values equal to zero. Given infinite time for compensatory evolution
to occur, they will asymptote to the neutral stationary probability distribution, even in the absence of
crypticity. But evolutionary timescales, however long, are far from infinite. In Figure 3 we show that even
over quite long evolutionary timescales, the variance in $\beta_{jlc}$ values can remain far from the asymptote. We see
a switch-like dependence on a key model parameter, p, where low levels of crypticity lead to little evolutionary
change in $\beta_{jlc}$ values, while high levels of crypticity lead to $\beta_{jlc}$ values of a magnitude not far from the
neutral expectation. This pattern is robust to the precise duration of the evolutionary simulations, with
the cutoff level of crypticity for this transition depending only modestly on the number of generations. In
subsequent figures, we set the number of generations equal to $5 \times 10^7$.
The fitness w(j) of genotype j is a product of C Gaussians, each a function of character value $x_{jc}$, with
optimum 0 and variance 1. In other words, fitness is a multivariate Gaussian with optimum (0, ...,0). For
each replicate simulation in this ancestral selective environment, we simulated an asexual Wright-Fisher
process over $5 \times 10^7$ generations, starting with a clonal population with all $\beta_{jlc} = 0$. The population is
replaced each generation by sampling, with the probability that a given new individual has genotype j equal
to:

$$\frac{n_j \, w(j)}{\sum_i n_i \, w(i)}, \qquad (2)$$

where $n_j$ denotes the number of individuals with genotype j and the sum is made over all the genotypes in
the population. Each of the sampled individuals is then subjected to possible mutation: since mutational
effects lie on a continuum, we have infinite alleles at each locus, and each mutation introduces a new
genotype in the population. Our model maps the discrete set of genotypes present at any moment in time
to a continuous C-dimensional phenotypic space.
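
One generation of the fitness-weighted resampling described above might look as follows. This is a hedged sketch rather than the authors' implementation: the function names are illustrative, genotypes are tracked only by index, and the Gaussian fitness is left unnormalized (an assumption; any constant factor cancels in the sampling probabilities).

```python
import math
import random

def fitness(x):
    """Product of C Gaussians with optimum 0 and variance 1 (unnormalized)."""
    return math.exp(-0.5 * sum(v * v for v in x))

def sampling_probabilities(counts, fitnesses):
    """Probability that a new individual has genotype j:
    n_j * w(j) / sum_i n_i * w(i)."""
    weights = [n * w for n, w in zip(counts, fitnesses)]
    total = sum(weights)
    return [wt / total for wt in weights]

def next_generation(counts, fitnesses, rng=random):
    """Draw N offspring genotypes and return the new genotype counts."""
    pop_size = sum(counts)
    probs = sampling_probabilities(counts, fitnesses)
    draws = rng.choices(range(len(counts)), weights=probs, k=pop_size)
    new_counts = [0] * len(counts)
    for j in draws:
        new_counts[j] += 1
    return new_counts
```

Mutation is deliberately left out of this sketch; in the model it would be applied to each sampled individual afterwards, with every mutation creating a new genotype.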

Consider an adaptive challenge at the end of this simulated evolution. By co-opting a cryptic sequence,
a new phenotype can be generated. The co-option of the l-th sequence in genotype j changes the value of
character $x_{jc}$ to:

$$e_{jlc} = x_{jc} + (1 - p) \times \beta_{jlc}. \qquad (3)$$

The coordinates $\mathbf{e}_{jl} = (e_{jl1}, \ldots, e_{jlC})$ define a point in phenotype space formed by all C characters,
corresponding to the phenotype that can be reached through the co-option of sequence l in genotype j. At
the end of each replicate simulation, we calculated the average Euclidean distance $d_G$ between a pair of
individuals each generated via the co-option of one sequence in the same genotype j. We also calculated
the average distance $d_P$ between two individuals of any genotype in the evolved population, again with one
sequence co-opted per individual (see Appendix).
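
The co-option map and the two distance summaries can be illustrated with the sketch below. The exact estimators are defined in the Appendix, which is not reproduced here, so the pairing scheme is an assumption: this version averages over all unordered pairs of distinct co-option variants, which gives zero for a single locus. All function names are hypothetical.

```python
import itertools
import math

def co_opt(betas, p, locus):
    """Phenotype after co-opting `locus`: e_c = x_c + (1 - p) * beta_lc."""
    n_chars = len(betas[0])
    x = [sum(p * seq[c] for seq in betas) for c in range(n_chars)]
    return [x[c] + (1 - p) * betas[locus][c] for c in range(n_chars)]

def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs of points."""
    pairs = list(itertools.combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

def neighborhood_richness(betas, p):
    """d_G-like summary: distances among co-option variants of one genotype."""
    variants = [co_opt(betas, p, l) for l in range(len(betas))]
    return mean_pairwise_distance(variants)

def phenotypic_potential(population, p):
    """d_P-like summary: pool the co-option variants of every individual.

    population: list of genotypes, one entry per individual (unweighted).
    """
    variants = [co_opt(betas, p, l)
                for betas in population for l in range(len(betas))]
    return mean_pairwise_distance(variants)
```

Dividing `neighborhood_richness` (averaged over genotypes) by `phenotypic_potential` gives a rough analogue of the ratio of neighborhood richness to total phenotypic potential.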



Simulations with a new optimum. We measured evolvability by sampling fixation events arising from 
co-option and regular mutants. We used C = 3 and generated 20 new phenotypic optima uniformly dis- 
tributed on a sphere of diameter d (R script available upon request); d is thus the distance to the new 
optimum. 
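
The authors' R script is not shown, so the following Python sketch is only a stand-in for it. It uses a standard technique, normalizing a vector of independent standard normal deviates, to draw points uniformly on a sphere centered on the ancestral optimum (the origin); the function name and the choice of placing each point at distance d from the origin are assumptions of this sketch.

```python
import math
import random

def random_optimum(distance, n_chars=3, rng=random):
    """Draw a point uniformly on the sphere at `distance` from the origin."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(n_chars)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:  # guard against the (vanishingly rare) zero vector
            return [distance * c / norm for c in v]

# Twenty new optima for one value of d, as in the text (C = 3).
optima = [random_optimum(2.0) for _ in range(20)]
```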

We sampled 1000 individuals per population at the end of the simulated evolutionary process described in 
the previous section. Each of these individuals generated two mutants, one with a co-option mutation and 
one with a regular mutation. The sequence affected by these mutations was chosen randomly among the L 
loci. Each evolved population was confronted with each new optimum. For each regular or co-opted mutant, 
we calculated the fixation probability based on the selection coefficient, which we calculated as: 

$$s(\text{mutant}) = \frac{w(\text{mutant})}{\langle w \rangle} - 1, \qquad (4)$$

where w(mutant) is the mutant fitness and $\langle w \rangle$ the population mean fitness. The probability of fixation
for a haploid population of size N is then calculated as:

$$p_{fix}(\text{mutant}) = \frac{1 - e^{-2 s(\text{mutant})}}{1 - e^{-2 N s(\text{mutant})}}. \qquad (5)$$

To simulate the expected number of trials until fixation, we repeatedly sampled one of the 1000 mutants
from a given population, and then allowed it to fix with probability $p_{fix}$, until the first successful fixation
occurred. The process was repeated 100 times per evolved population for each of the 20 new optima. At
the end of each repetition, we recorded the number of trials before fixation - we report the median of this
number in Figs. 7A and 8A. To reduce computation time, we used one set of mutants per evolved population,
resampling from them and their fixation probabilities until the first success. Also for time efficiency, the
process was stopped if no success was obtained after $10^5$ trials. This did not affect our results based on the
median. For each mutant that fixed in our simulations, we also recorded the distance to the new optimum
- we report the average of this distance in Figs. 7B and 8B.
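
The selection coefficient, the haploid fixation probability and the trial-counting procedure can be sketched as below. Function names are illustrative; the neutral limit 1/N for s = 0 and the overflow guard for strongly deleterious mutants are additions made for numerical safety, not details taken from the text.

```python
import math
import random

def selection_coefficient(w_mutant, w_mean):
    """s = w(mutant) / <w> - 1."""
    return w_mutant / w_mean - 1.0

def fixation_probability(s, pop_size):
    """(1 - exp(-2s)) / (1 - exp(-2Ns)) for a haploid population of size N."""
    if s == 0.0:
        return 1.0 / pop_size  # neutral limit of the expression
    exponent = -2.0 * pop_size * s
    if exponent > 700.0:  # strongly deleterious: the ratio underflows to 0
        return 0.0
    # expm1(x) = exp(x) - 1, so the two sign changes cancel in the ratio.
    return math.expm1(-2.0 * s) / math.expm1(exponent)

def trials_until_fixation(fix_probs, max_trials=10**5, rng=random):
    """Resample mutants until one fixes; None if the trial cap is reached."""
    for trial in range(1, max_trials + 1):
        p_fix = rng.choice(fix_probs)
        if rng.random() < p_fix:
            return trial
    return None
```

Repeating `trials_until_fixation` 100 times on the fixation probabilities of one population's mutants, and taking the median of the results, mirrors the summary statistic described above.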

Results 

Compensatory evolution creates phenotypically rich mutational neighborhoods available through 
co-option. We simulated the evolution of genotypes with single or multiple cryptic sequences contribut- 
ing to a set of traits, over 50 million generations. At the end of this simulated evolution, we quantified 
the phenotypic effect of co-opting one cryptic sequence. A population may have access to a variety of phe- 
notypes through co-option. We quantified this phenotypic richness as the average Euclidean distance, in 
the phenotypic space formed by the C characters, between two individuals subject to independent cryptic 
sequence co-option events (see Appendix). 

In the one-locus version of our model (Fig. 4, L = 1), the phenotypic richness accessible to the
population as a whole ($d_P$) increases when the expressivity of cryptic sequences, p, decreases. This is because
accumulated genetic diversity - the variation in the effect size of a given sequence, across individuals in
the population - increases as sequences become more cryptic. With one locus, the neighborhood richness
available via co-option is trivially equal to zero, since the only way to sample two mutational neighbors of
the same genotype is to sample the same co-option event, at the only available locus, twice. Neighborhood
richness values can be higher only when multiple different cryptic sequences are available for co-option.
Accordingly, in the multi-locus model, both neighborhood richness (quantified by $d_G$) and genetic diversity
contribute to the phenotypic potential of the population ($d_P$). $d_P$ includes both neighborhood richness and
genetic diversity, so it is always higher than $d_G$, which captures only the former. The ratio $d_G/d_P$ quantifies
the share of this potential that is due to neighborhood richness. This ratio approaches 1 when p is lower than
0.1 (Fig. 4, L = 10). This is a
very reasonable parameter range for a cryptic sequence, meaning that most of the phenotypic variability of
the population resides in neighborhood richness, not genetic diversity.



Figure 4. Neighborhood richness $d_G$ explains most of the
phenotypic potential of a population $d_P$ when sequences are
strongly cryptic. Top: When L = 10, both the ratio $d_G/d_P$ and
$d_P$ increase when p decreases and sequences are more cryptic.
Crypticity needs to be more complete in order to drive $d_P$ up
when L = 1, a situation in which $d_G$ always equals 0 because
an increase of neighborhood richness via intragenomic
diversity is impossible. $d_P$ remains low when L = 1, even when the
effective mutation rate is increased tenfold ('high $\mu$', blue
diamonds), such that the mutation rate per genome equals that for
L = 10. Middle: The large ratio $d_G/d_P$ cannot be explained by
a lack of genetic diversity in the population. Pairwise diversity
was quantified as the probability that two individuals have
different genotypes, calculated as $1 - \sum_i f_i^2$, with $f_i$ the frequency
of genotype i. Bottom: k is a metric of compensatory
evolution (see text), calculated as the normalized difference between
the mean variance in $\sum_l \beta_{jlc}$ expected if loci had evolved
independently, and the mean variance observed in simulations.
Positive values of k indicate compensatory evolution. Parameter
values: $N = 10^5$, $\mu = 10^{-7}$, C = 3. Results are averaged
over 200 (L = 1) or 49 (L = 10) simulations. The bars in the
top and center panels represent the 0.25 and 0.75 quantiles in
the distribution of $d_G/d_P$ across replicate simulations.
The predominant role of neighborhood richness could be explained by a limited number of genotypes per
population, each with high genetic potential. To examine this possibility, we quantified pairwise diversity as
the probability that two individuals have different genotypes (Fig. 4). This measure of diversity increases
as p decreases, and is high when $d_G/d_P$ is high. High values of this ratio are therefore not due to a lack of
genetic polymorphism.
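
The pairwise diversity measure, one minus the sum of squared genotype frequencies, is simple enough to state as a short sketch. The function name and the use of hashable genotype labels are assumptions made for illustration.

```python
from collections import Counter

def pairwise_diversity(genotypes):
    """Probability that two sampled individuals have different genotypes,
    computed as 1 - sum_i f_i**2 with f_i the frequency of genotype i.

    genotypes: one hashable genotype label per individual.
    """
    counts = Counter(genotypes)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())
```

A clonal population gives 0, and a population split evenly between two genotypes gives 0.5.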

In our model, the mutation rate per character increases with the number of sequences L. Although this
assumption seems reasonable, it might provide an artefactually lower value of $d_P$ when L = 1. We therefore
ran simulations with L = 1 with a 10-fold higher mutation rate, for comparison to the L = 10 case in Fig.
4, holding constant the mutation rate per genome (instead of per cryptic sequence). Pairwise diversity is
superimposable for different values of L. With the higher mutation rate, $d_P$ increases significantly for L = 1.
Nevertheless, $d_P$ for L = 1 remains a small fraction of the value obtained when L = 10 (Figure 4 top panel,
'high $\mu$'), so the increased adaptive potential in the multi-locus case cannot be explained by more frequent
mutations alone.

Compare total population variability $d_P$ in the one-locus model to the ratio $d_G/d_P$ in the 10-locus
case (Fig. 4). Neighborhood richness, represented by the ratio, builds up even at levels of crypticity that
are too low (i.e. p is too high) to allow substantial one-locus genetic diversity. Something is happening
in the multi-locus case that is not a simple extrapolation of the one-locus case. We attribute this to
compensatory evolution, whereby the fitness decrease associated with a mutation is compensated by one or
several mutations at other sites (Fig. 5; Poon and Otto, 2000; Poon and Chao, 2005; Rokyta et al., 2002;
Harcombe et al., 2009; Meer et al., 2010). When a deleterious variant is cryptic, relaxed selection increases
[...] (Kimura, 1985; Phillips, [...]; Haas, [...]) [...]
substitutions are therefore expected to fix more readily at low values of p, when selection on cryptic sequences
is relaxed. This is consistent with observed high values of $d_G/d_P$ as well as $d_G$ when p is small, under the
influence of compensatory evolution.




Figure 5. The compensatory evolution of multilocus characters increases neighborhood richness. A mutation in a sequence 
controlling color (see Fig. 2) likely moves the phenotype away from the environmental optimum (1). Before this variant is 
eliminated, a backward mutation (2, unlikely) or a compensatory mutation (3) may occur and cancel its phenotypic effect. 
After the compensatory pair has fixed, a greater diversity of phenotypes can be accessed through co-option. 

Compensatory evolution means that a locus will evolve an effect size in a different direction to (i.e. 
negatively correlated with) the effect sizes at other loci. We can see this compensation directly, using a 
simple test of independence. Consider vectors of $\beta$-values, both for one locus and for an individual with L
loci. With independence across loci, the variance across L-loci genotypes is expected to be L times the
per-locus variance. With compensatory evolution and resulting negative correlation, the observed variance will
be lower than the expectation under independence. In the bottom panel of Figure 4, we plot the metric k,
which quantifies the departure of the variance in $\sum_l \beta_{jlc}$ from its expected value under independence. Given
the analytically known expectation of zero for $\sum_l \beta_{jlc}$, for each trait we calculate the observed variance
as the simple mean of $(\sum_l \beta_{jlc})^2$ across all individuals in all simulations. The expected variance under
independence, given that $E(\beta_{jlc}) = 0$, equals L times the mean of $\beta_{jlc}^2$ across all simulations, individuals,
and loci. k is the mean, across traits, of the difference between the expected variance in $\sum_l \beta_{jlc}$ and its
observed value, divided by the former, such that compensatory evolution is detected by positive values of
k. In the bottom panel of Fig. 4, we see that compensatory evolution indeed occurs at low values of p,
when selection on cryptic sequences is weak. However, evolution is never completely neutral (i.e. k is never
equal to 0) even for the most cryptic sequences considered: k decreases only slowly with decreasing p when
$p < 10^{-1.75}$, and remains high even when $p = 10^{-2.5}$. This shows that compensatory evolution is important
for a wide range of cryptic sequences, not just those near the threshold at which cryptic sequences evolve
away from their initial values of zero.
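
The verbal definition of k translates directly into code. The sketch below follows the description above for a pooled list of individuals; the function name and the nested-list data layout are assumptions made for illustration, and the function is undefined (division by zero) if all effects for a trait are exactly zero.

```python
def compensation_metric(individuals):
    """Metric k: normalized shortfall of the observed variance of the summed
    effects relative to its expectation under independence across loci.

    individuals: list of genotypes, each a list of L lists of C effect sizes.
    Returns the mean over traits of (expected - observed) / expected.
    """
    n_loci = len(individuals[0])
    n_chars = len(individuals[0][0])
    ks = []
    for c in range(n_chars):
        # Observed variance: mean of (sum_l beta_lc)^2, the mean being zero.
        sums_sq = [sum(seq[c] for seq in ind) ** 2 for ind in individuals]
        observed = sum(sums_sq) / len(sums_sq)
        # Expected under independence: L times the mean of beta_lc^2.
        squares = [seq[c] ** 2 for ind in individuals for seq in ind]
        expected = n_loci * sum(squares) / len(squares)
        ks.append((expected - observed) / expected)
    return sum(ks) / len(ks)
```

Two loci with exactly opposite effects give k = 1 (full compensation), whereas two loci with identical effects give k = -1.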

Selection becomes effective for values of the selection coefficient above 1/N. For cryptic sequences, the
threshold for selection is $1/(pN)$. As expected, p and N have similar effects (Fig. 6A). An increase in
N, by increasing the effectiveness of selection, also decreases the sojourn time of a deleterious variant. In
consequence, compensatory evolution becomes less likely when N is large, and $d_G$ decreases (along with
$d_G/d_P$, Fig. 6A).







Figure 6. The mean proportion of variation due to neighborhood
richness ($d_G/d_P$) decreases with the population size N
and when weakly cryptic sequences encode a large number of
characters. A: $d_G/d_P$ decreases when the product pN exceeds
a threshold. B: $d_G/d_P$ and $d_P$ decrease with C when $p = 10^{-1}$
but not when $p = 10^{-2}$. The blue dashed lines represent the
absolute values of $d_P$. Parameter values: L = 10, C = 3 (A),
$N = 10^5$ (B). Results are averaged over 49 or 34 simulations
(the latter when $N > 10^5$ in panel A and C > 5 in panel B).
The bars represent the 0.25 and 0.75 quantiles in the
distribution of $d_G/d_P$ across replicate simulations.



We expect more compensatory combinations to fix when compensatory mutations are more common. 
When the number of characters increases, the probability that a mutation has compensating effects on all 
the appropriate characters becomes vanishingly small. Accordingly, the neighborhood richness $d_G$ becomes
a less important part of the total variation present in the population as the number of characters increases,
given substantial selection on cryptic sequences (Fig. 6B, $p = 10^{-1}$). However, with greater crypticity ($p = 10^{-2}$),
neighborhood richness continues to dominate, even for larger numbers of character dimensions.



Compensatory evolution increases evolvability. $d_P$ gives an easily decomposable but indirect
measure of evolvability. To estimate population evolvability more directly, we assigned a new optimum phenotype
at a distance d (see Methods, section "Simulations with a new optimum"). This is a C-dimensional version 
of Fisher's geometric model, where mutations correspond to vectors in C-dimensional phenotype space. 
Evolvability means the ability to generate and fix adaptive mutants. 

For each value of p, we generated a set of mutants from random individuals in each evolved population, 
and calculated their selection coefficients and fixation probabilities. Using Monte-Carlo simulations, we then 
calculated the median number of mutants trialed until one fixed (see Methods). The general trend that we 
observe is that as d gets larger, the median number of trials approaches an asymptotic value of 2. This is
a standard asymptotic result for Fisher's geometric model (Fisher, 1930; Poon and Otto, 2000): when the
distance to the optimum is very large relative to the mutational effect size, nearly half the mutations will
be improvements. Regular mutations have moderate effect sizes, so the median number of trials approaches
2 for intermediate distances to the optimum (Fig. 7A).

Figure 7. Co-option increases the potential for adaptation
to large environmental changes only. A: Waiting time for an
adaptive fixation scaled according to the number of mutations
trialed before one fixes - the median number of trials is
represented. A short waiting time (i.e. a small number of trials)
indicates high evolvability. Mutations in non-cryptic sequences
yield a fixation event more rapidly, although the calculations
do not include the fact that only co-option mutations have
been pre-screened for strong, unconditionally deleterious effects
(Masel, 2006; Rajon and Masel, 2011). B: Distance to the new
optimum, following a fixation event. For larger environmental
changes, one fixed co-option mutation yields a greater advantage
than a regular mutation in either a cryptic or a non-cryptic
sequence. Parameter values: $N = 10^5$, $\mu = 10^{-7}$, C = 3, and
$p = 10^{-2}$ for cryptic sequences (p = 1 otherwise). The bars
represent the 0.25 and 0.75 quantiles in the distribution of the
number of trials until fixation (panel A) or of the distance to
the new optimum (panel B).

Co-option mutations have large effect sizes. Therefore, they sometimes lead to phenotypic changes much 
larger than those required for small-scale adaptation - a phenomenon called "overshooting" (Sellis et al., 
2011). Consequently, the median number of co-option trials approaches the asymptotic expectation of 2 at 
larger values of d, relative to mutations of non-attenuated effects (Fig. 7A). However, when the co-option 
mutations do fix, they bring the phenotype closer to the optimum than regular mutations do (Fig. 7B). 
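
The geometric argument behind the asymptote and behind overshooting can be illustrated with a minimal numerical sketch. It is not the simulation used in this paper: it ignores fitness, fixation probabilities and population size, and only counts the fraction of random mutations that move a phenotype closer to the optimum. The function name `fraction_beneficial` and the two mutation sizes (r = 0.2 for a "moderate" mutation, r = 2.0 for a "large" one) are assumptions chosen for illustration. For C = 3 the simulated fraction can be checked against the exact value (1 - r/(2d))/2, which holds when r < 2d.

```python
import numpy as np

def fraction_beneficial(d, r, C=3, n_mutants=200_000, seed=0):
    """Fraction of random mutations of size r that bring a phenotype closer
    to an optimum located at distance d, in C-dimensional phenotype space.

    Illustrative only: r, n_mutants and seed are hypothetical choices.
    """
    rng = np.random.default_rng(seed)
    # Normalised Gaussian vectors are uniformly distributed on the unit sphere.
    u = rng.standard_normal((n_mutants, C))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    # Optimum at the origin; current phenotype at distance d along the first axis.
    z = np.zeros(C)
    z[0] = d
    new_dist = np.linalg.norm(z + r * u, axis=1)
    # A mutation counts as an improvement if it ends up closer to the optimum.
    return float(np.mean(new_dist < d))

if __name__ == "__main__":
    for d in (0.5, 1.0, 2.0, 4.0, 16.0):
        moderate = fraction_beneficial(d, r=0.2)
        large = fraction_beneficial(d, r=2.0)
        print(f"d = {d:5.1f}  moderate: {moderate:.3f}  large: {large:.3f}")
```

In this sketch, the fraction of improvements for the smaller mutation size is already close to 1/2 at intermediate d, whereas the larger mutation size overshoots at small d and only approaches 1/2 when d is large, in line with the qualitative pattern described for Fig. 7A.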

Consider again our example of cryptic coding sequences in 3'-UTRs. In this case, adaptation can happen 
in the main coding sequence (p = 1) or via co-option; mutations within cryptic 3'-UTR sequences are usually 
ignored as a source of adaptation. In contrast, mutations in other systems (e.g. gene networks with complex 
epistasis and/or genes expressed at constitutively low levels) may frequently be cryptic to a greater or lesser 
extent, with no alternative non-cryptic sequence available for comparison. In Figure 7 (blue circles and 
dashed line) we see that even if they occur reasonably frequently, mutations within cryptic sequences are 
only important - relative to co-option - for very small changes to the optimal phenotype. 

Figure 4A showed how the phenotypic potential of co-option mutations increases with crypticity, with the 
transition occurring at lower crypticity when compensatory evolution occurs (i.e. with more loci). In Figure 
8A we see that the parameter range in which we find high neighborhood richness corresponds to the parameter 
range of high evolvability as assayed by our more direct measure. Compensatory evolution occurring at 
low p consistently reduces the number of trials needed before an adaptive fixation. It also brings the new 
phenotype dramatically closer to the new optimum (Fig. 8B). In contrast, mutations of attenuated effect 
contribute less to evolvability (Fig. 8, dashed lines). Mutations in non-cryptic sequences are captured by 
the special case of p = 1 (dashed line, far left). 

Note that if we wanted to infer the extent to which cryptic rather than non-cryptic mutations contribute 
to evolvability, we would also need to know the ratio of mutation rates between co-option and "normal" 
mutations, which will vary between different biological systems. We have calculated the phenotypic potential 
($d_P$) and evolvability for one value of crypticity (p) at a time, corresponding to one category of sites. 

Figure 8. Higher crypticity leads to higher evolvability through co-option. A: The evolvability of co-option 
mutations, measured as the median of the (small) number of mutants that need to be trialed before one fixes, 
mirrors the results of Fig. 4A. This means that high evolvability tracks high neighborhood richness driven by 
compensatory evolution in cryptic sequences. Regular mutations in cryptic sequences are shown for comparison. 
B: Not only do co-option mutants fix more readily when they reveal more cryptic sequences, they also reach 
phenotypes much closer to the new optimum. Large adaptive phenotypic changes are possible through single 
co-option mutations in the 10-loci case when sufficient variation across cryptic sequences has accumulated 
through compensatory evolution (i.e. when $d_P$ and $d_G/d_P$ are high in Fig. 4A). Same parameter values as 
in Fig. 4; the distance from the old to the new optimum d equals 4. The bars represent the 0.25 and 0.75 
quantiles in the distribution of the number of trials until fixation (panel A) and of the distance to the new 
optimum (panel B). 

The primary result of this paper is to infer that if one accepts that co-option mutations might be 
important to evolvability, then their importance is almost independent of the presence of pre-existing genetic 
diversity. This result overturns, for polygenic traits, the previously dominant metaphor of evolvability via 
the spread of a population across a neutral network (Wagner, 2005, 2008; Draghi et al., 2010). We attribute 
this result to compensatory evolution on cryptic sequences while they are cryptic, which leads to a larger 
eventual effect size if and when they are eventually co-opted. This prior compensatory evolution yields 
co-option mutations with larger effect sizes than "normal" mutations, with a larger resulting contribution 
to evolvability. 



Discussion 



Cryptic sequences may exist within a range of genetic architectures, which we illustrate with two represen- 
tative examples. In one example, a gene can epistatically control the effect of other genes, e.g. through 
the regulation of their expression. Each gene in the epistatic network is a potentially cryptic sequence, and 
co-option in this system occurs via a change in epistatic interactions. In our second example (3'-UTRs), 
cryptic sequences have additive effects, and are associated with nearby non-cryptic sequences that also con- 
tribute additively to the same set of phenotypic characters. Co-option converts a cryptic sequence into a 
non-cryptic one. 

In both our examples, the co-option of a cryptic sequence results in larger phenotypic changes when com- 
pared to other classes of mutations. The large size of the change is a consequence of compensatory evolution 
that took place prior to co-option, while the sequence in question was cryptic. The possibility of selection 
prior to mutation has been called the look-ahead effect (Whitehead et al., 2008), and has been previously 
studied in sexual (Masel, 2006; Kim, 2007) as well as asexual (Whitehead et al., 2008; Rajon and Masel, 
2011) populations. 

Many metrics of evolvability measure how frequently potentially adaptive mutants are generated, but 
neglect their magnitude. By allowing access to distant phenotypes in a small number of mutational steps, 
co-option may be an important mechanism for evolutionary innovations that involve large-scale phenotypic 
changes. This evolvability benefit of the co-option of cryptic sequences, which we show in Figure 8B, 
is in addition to other previously reported advantages (not considered in our model), in particular the 
prior exclusion of strongly deleterious alleles whose expression is unconditionally disadvantageous in all 
environments (Masel, 2006; Rajon and Masel, 2011) and the ability to overcome certain forms of synergistic 
(Griswold and Masel, 2009) and antagonistic (Masel, 2006; Kim, 2007; Whitehead et al., 2008) epistasis. 

While our model is reasonably general, it makes important assumptions, upon which our conclusions 
might depend. Some of these assumptions are mathematical conveniences not likely to alter outcomes. More 
significantly, we assume that i) reproduction is asexual, ii) de novo mutations (including mutations that 
co-opt previously cryptic sequences), rather than standing genetic variation alone, sometimes contribute 
to evolvability, iii) phenotypic traits are affected by more than one cryptic sequence, and iv) co-option 
mutations in a single gene affect a relatively small number of phenotypic traits. Assumption iii) seems 
unproblematic in the light of numerous QTL and GWAS studies detecting large numbers of loci contributing 
to individual traits (Visscher, 2008; Ehrenreich et al., 2010, 2012; Flint and Mackay, 2009; Buckler et al., 
2009). Assumption iv) is also compatible with recent findings indicating that the number of traits encoded 
by a given gene (pleiotropy) is restricted to a small fraction of the traits measured in yeast, nematodes, 
fish, mice and humans (Goh et al., 2007; Wagner et al., 2008; Kenney-Hunt et al., 2008; Albert et al., 2008; 
Wang et al., 2010), and organized into modules where genes contribute to similar sets of traits (Wang et al., 
2010). Our model describes the evolution of such a module. 



In our model, adaptation proceeds via the co-option of a single cryptic sequence. This assumption seems 
reasonable, but note that other "evolutionary capacitance" systems exist where several cryptic sequences 
may be co-opted at once - e.g. the [PSI+] prion in yeast (Griswold and Masel, 2009; Torabi and Kruglyak, 
2012) or the Rho terminator in Escherichia coli (Freddolino et al., 2012). In such cases, the contribution 
of cryptic sequences to adaptation can be more extreme than the situation described here. 

Below we discuss the first two key assumptions - asexual reproduction and adaptation from de novo 
mutations - in more detail. First, consider our assumption of asexual reproduction. Our results rely 
strongly on compensatory evolution, and empirical data (Meer et al., 2010), as well as the simulation results 
presented here, suggest that compensatory evolution is common in the absence of recombination. The key 
issue is whether high rates of compensatory evolution are also expected among cryptic sequences in sexual 
populations. Unfortunately, comparable sexual simulations would require the tracking of a far larger number 
of genotypes, making such simulations computationally inaccessible. However, some heuristic predictions 
can be made, based on previous analytical theory on this topic. Consider the simple example of a segregating 
pair of loci with mutually compensatory phenotypic effects. When recombination breaks up compensatory 
pairings, alleles from rare pairs are likely eliminated. This will initially select against new compensatory allele 
pairs (when their component alleles are rare) but may later favor them if they survive to become common. 
On the other hand, two alleles that would form a compensatory pair but appeared in different individuals 
may sometimes be brought together by recombination. The net result of these effects is complex, but theory 
shows that in the slightly different case where the pair of compensatory alleles is more fit than the original 
genotype, pairs fix more frequently with low recombination, and equally frequently with either high or zero 
recombination, so long as selection against cryptic alleles is weak (Weissman et al., 2010). This theoretical 
finding suggests that our results may apply to sexual populations as well. Indeed, empirically, compensatory 
evolution seems to occur despite frequent recombination in sexual species, as suggested by QTL mapping 
studies that have reported alleles with opposite phenotypic effects at different loci (Rieseberg et al., 1999; 
Brem and Kruglyak, 2005; Carlborg et al., 2006; Visscher, 2008). 



Rare recombination can even facilitate the co-option of cryptic sequences via a different mechanism. 
With facultative sex, recombination breaks up compensatory combinations and can result in new phenotypes 
(Lynch and Gabriel, 1983). In this case, co-option occurs via recombination whereas in our model co-option 
occurs through mutation (Masel and Trotter, 2010). Co-option via recombination requires genetic diversity, 
whereas co-option by mutation as treated here does not. In other words, here we have shown how cryptic 
sequences contribute not only to standing genetic variation, but also to the effects of de novo mutations. 

This brings us to our second major assumption, namely that de novo mutations are sometimes important 
to adaptation. There is support for this in natural populations of garter snakes (Feldman et al., 2009), 
mice (Linnen et al., 2009), Drosophila (Karasov et al., 2010) and humans (Peter et al., 2012), as well as 
experimental populations of maize (Durand et al., 2012) and bacteria (Lenski and Travisano, 1994; Cooper 
and Lenski, 2010; Blount et al., 2012). Note that we do not assume that de novo mutations are 
more important than standing genetic variation, merely that they are important in some instances. 

There are many ways that genetic diversity can be lost. Genetic diversity is eroded by genetic drift 
in small populations (Willi et al., 2006). In addition, populations of any size can suffer losses 
of genetic diversity when an adaptive allele sweeps to fixation and brings linked loci with it (Gillespie, 
2000, 2001). Genetic variance then needs time to recover before it can be used again for adaptation 
(Le Rouzic and Carlborg, 2007). Inbreeding can increase background selection against recessive alleles and 
also eliminate variation at linked loci. When these stochastic processes are stronger than selection against 
cryptic sequences, as captured by small N in Fig. 6A, neighborhood richness dominates evolutionary 
potential. 

It was previously thought that the diversity of phenotypes produced from cryptic variation is lost when 
genetic variance is lost. Without the ability to generate phenotypically diverse variants, populations facing 
new environmental conditions may fail to adapt. Here we have shown that there is still hope: quantitative 
characters are encoded by multiple sequences, each of which can have a different phenotypic effect through 
co-option and facilitate adaptation. 
Appendix 



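Calculation of $d_G$ and $d_P$. The Euclidean distance between two points in phenotype space, corresponding 
to genotypes $j_1$ and $j_2$ with sequences $l_1$ and $l_2$ co-opted, equals 

$$ d_{j_1 l_1 \leftrightarrow j_2 l_2} = \sqrt{\sum_{c=1}^{C} \left( e_{j_1 l_1 c} - e_{j_2 l_2 c} \right)^2}, \tag{6} $$

where $e_{j l c}$ denotes the coordinate, along phenotypic character $c$, of the point reached when sequence 
$l$ is co-opted in genotype $j$. 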



The potential phenotypic range that a given genotype $j$ can access by co-option can be represented by 
the set of $L$ points $e_{jl}$ in the phenotype space, where each point corresponds to the co-option of one 
cryptic sequence. The ability of a genotype to reach new phenotypes by co-option can be summarized by the mean 
distance between two of these points. An individual with genotype $j$ and sequence $l_1$ co-opted is at an 
average distance from another individual with the same genotype and any sequence co-opted: 

$$ \frac{1}{L} \sum_{l_2=1}^{L} d_{j l_1 \leftrightarrow j l_2} \tag{7} $$



Therefore, on average, two individuals with the same genotype (assuming they exist in the population), 
each with one sequence co-opted, will be at a distance: 

$$ \frac{1}{L^2} \sum_{l_1=1}^{L} \sum_{l_2=1}^{L} d_{j l_1 \leftrightarrow j l_2} \tag{8} $$



Note that each distance between two individuals with initial genotype $j$ is averaged over $L^2$ possible 
values, which include when the same sequence is co-opted in two different individuals. At the end of a given 
simulation, $n_{gen}$ different genotypes segregate in the population, each in $n_j$ copies. Consider sampling 
two individuals at random in order to assess the expected pairwise differences between them following 
co-option, with and without the condition that the two individuals start with the same genotype. An individual 
with genotype $j$ may be compared to $n_j - 1$ individuals that share the same initial genotype, so the total 
number of pairwise distances per genotype equals $n_j (n_j - 1)/2$. In the whole population, the average 
distance between two individuals with the same initial genotype and one co-option mutation therefore equals 
(the factor 1/2 cancels between numerator and denominator): 

$$ d_G = \frac{\sum_{j=1}^{n_{gen}} n_j (n_j - 1) \left( \frac{1}{L^2} \sum_{l_1=1}^{L} \sum_{l_2=1}^{L} d_{j l_1 \leftrightarrow j l_2} \right)}{\sum_{j=1}^{n_{gen}} n_j (n_j - 1)} \tag{9} $$



We want to compare this distance $d_G$, which represents the potential for phenotypic evolution of any one 
representative genotype in the population, to the comparable distance $d_P$. An individual in the population 
with genotype $j_1$ and with sequence $l_1$ co-opted can be compared to $N - 1$ other individuals, among which 
$n_{j_1} - 1$ have the same initial genotype $j_1$. Its average phenotypic distance to any other individual in 
the population with genotype $j_2$ and with a sequence $l_2$ co-opted thus equals: 

$$ \frac{\sum_{j_2=1,\, j_2 \neq j_1}^{n_{gen}} \left( n_{j_2} \sum_{l_2=1}^{L} d_{j_1 l_1 \leftrightarrow j_2 l_2} \right) + (n_{j_1} - 1) \sum_{l_2=1}^{L} d_{j_1 l_1 \leftrightarrow j_1 l_2}}{L \times (N - 1)} \tag{10} $$



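Averaged across all sequences that could be co-opted in the first individual, the mean distance between an 
individual with initial genotype $j_1$ and any other individual in the population, after the co-option of one 
cryptic sequence in each genotype, equals: 

$$ \frac{1}{L^2 \times (N - 1)} \sum_{l_1=1}^{L} \left[ \sum_{j_2=1,\, j_2 \neq j_1}^{n_{gen}} \left( n_{j_2} \sum_{l_2=1}^{L} d_{j_1 l_1 \leftrightarrow j_2 l_2} \right) + (n_{j_1} - 1) \sum_{l_2=1}^{L} d_{j_1 l_1 \leftrightarrow j_1 l_2} \right] \tag{11} $$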
Averaged over all individuals in the population, the mean phenotypic distance between any two 
individuals in the population equals: 

$$ d_P = \frac{1}{L^2 \times N(N - 1)} \sum_{j_1=1}^{n_{gen}} n_{j_1} \sum_{l_1=1}^{L} \left[ \sum_{j_2=1,\, j_2 \neq j_1}^{n_{gen}} \left( n_{j_2} \sum_{l_2=1}^{L} d_{j_1 l_1 \leftrightarrow j_2 l_2} \right) + (n_{j_1} - 1) \sum_{l_2=1}^{L} d_{j_1 l_1 \leftrightarrow j_1 l_2} \right] \tag{12} $$
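
The averaging described in this appendix can be sketched numerically as follows. This is a minimal illustration rather than the code used for the simulations: the function name `cooption_distances`, the array layout (one phenotype point per genotype and per co-opted sequence) and the return names `d_G` (within-genotype distance) and `d_P` (population-wide distance) are assumptions made for this sketch. It assumes that at least one genotype is present in two or more copies, since the within-genotype average is otherwise undefined.

```python
import numpy as np

def cooption_distances(points, counts):
    """Mean phenotypic distances after one co-option event per individual.

    points : array of shape (n_gen, L, C); points[j, l] is the phenotype
             reached when sequence l is co-opted in genotype j (hypothetical layout).
    counts : array of shape (n_gen,); number of copies of each genotype.
    Returns (d_G, d_P): the average distance between two individuals sharing
    the same initial genotype, and between any two individuals.
    """
    points = np.asarray(points, dtype=float)
    counts = np.asarray(counts, dtype=float)
    N = counts.sum()

    # Euclidean distance between every pair of points: dist[j1, l1, j2, l2].
    diff = points[:, :, None, None, :] - points[None, None, :, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Average over the L^2 combinations of co-opted sequences: M[j1, j2].
    M = dist.mean(axis=(1, 3))

    # Within-genotype pairs: genotype j contributes n_j (n_j - 1) ordered pairs.
    same_pairs = counts * (counts - 1)
    d_G = (same_pairs * np.diag(M)).sum() / same_pairs.sum()

    # All ordered pairs of distinct individuals: n_j1 * n_j2 between genotypes,
    # n_j1 * (n_j1 - 1) within a genotype; the weights sum to N (N - 1).
    W = np.outer(counts, counts)
    np.fill_diagonal(W, same_pairs)
    d_P = (W * M).sum() / (N * (N - 1))
    return d_G, d_P

if __name__ == "__main__":
    # Two genotypes, two cryptic sequences each, one phenotypic character.
    pts = [[[0.0], [2.0]], [[10.0], [12.0]]]
    print(cooption_distances(pts, [2, 2]))
```

In a strictly clonal population (a single genotype), the two quantities coincide, so that their ratio equals 1; the ratio falls below 1 when differences between genotypes add to the differences between sites.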